Hierarchical Control and Learning for Markov Decision Processes
Abstract
This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a policy defined over a region of an MDP to an action in a semi-Markov decision problem (SMDP). Several algorithms are presented for performing this transformation efficiently. This dissertation introduces the HAM method for generating hierarchical, temporally abstract actions. This method permits the partial specification of abstract actions in a way that corresponds to an abstract plan or strategy. Abstract actions specified as HAMs can be optimally refined for new tasks by solving a reduced SMDP. The formal results show that traditional MDP algorithms can be used to optimally refine HAMs for new tasks. This can be achieved in much less time than it would take to learn a new policy for the task from scratch. HAMs complement some novel decomposition algorithms that are presented in this dissertation. These algorithms work by constructing a cache of policies for different regions of the MDP and then optimally combining the cached solutions to produce a global solution that is within provable bounds of the optimal solution. Together, the methods developed in this dissertation provide important tools for producing good policies for large MDPs. Unlike some ad hoc methods, these methods provide strong formal guarantees. They use prior knowledge in a principled way, and they reduce larger MDPs into smaller ones while maintaining a well-defined relationship between the smaller problem and the larger problem.
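The policy-to-SMDP-action transformation described above can be made concrete with a small sketch. The fragment below is illustrative only, not one of the dissertation's algorithms; `P`, `R`, `region`, `policy`, and `gamma` are assumed names. It computes the two quantities that let a fixed policy over a region behave as a single SMDP action: the expected discounted reward accumulated before leaving the region, and the discounted distribution over exit states.

```python
import numpy as np

def policy_to_smdp_action(P, R, region, policy, gamma=0.95, iters=500):
    """P[s][a][s'] : transition probabilities, R[s][a] : expected reward.
    region : set of states the fixed policy is defined over.
    policy : dict mapping each region state to an action.

    Returns, for every state s in the region:
      r[s]    -- expected discounted reward accumulated before exiting
      F[s][x] -- E[gamma^tau * 1{first exit state is x}], the discounted
                 exit distribution; together (r, F) act as the reward and
                 transition model of one temporally abstract SMDP action.
    """
    n_states = P.shape[0]
    exits = [x for x in range(n_states) if x not in region]
    r = {s: 0.0 for s in region}
    F = {s: {x: 0.0 for x in exits} for s in region}
    for _ in range(iters):  # fixed-point sweeps on both recurrences
        for s in region:
            a = policy[s]
            r[s] = R[s, a] + gamma * sum(P[s, a, s2] * r[s2] for s2 in region)
            for x in exits:
                F[s][x] = gamma * (P[s, a, x] +
                                   sum(P[s, a, s2] * F[s2][x] for s2 in region))
    return r, F
```

Both recurrences are gamma-contractions, so the fixed-point sweeps converge; a production implementation would more likely solve the corresponding linear systems directly.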
Similar Papers
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques for solving large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs), which can be classified into levels. At each level, smaller problems called restricted MDPs are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
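To make the level-by-level idea concrete, here is a minimal sketch, not the paper's accelerated technique, that solves restricted MDPs in reverse topological order of the SCC graph. It assumes the partition into `levels` is already given; `P`, `R`, and `gamma` are illustrative names.

```python
import numpy as np

def solve_by_levels(P, R, levels, gamma=0.95, iters=1000):
    """P[s][a][s'] : transitions, R[s][a] : rewards.
    levels : lists of state indices in reverse topological order of the
    SCC graph, so each level only reaches itself or earlier levels.
    Returns the global value function assembled from restricted MDPs."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for component in levels:          # one restricted MDP per level
        for _ in range(iters):
            for s in component:
                # Backups may reference earlier levels, whose values
                # are already final boundary conditions.
                V[s] = max(R[s, a] + gamma * P[s, a] @ V
                           for a in range(n_actions))
    return V
```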
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning-automata-based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
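As background for the snippet above, the sketch below shows a single classic linear reward-inaction learning automaton, the kind of update rule that generalized-automata algorithms build on; it is not the paper's MMDP algorithm, and all names here are assumptions.

```python
import numpy as np

class LearningAutomaton:
    """Linear reward-inaction (L_R-I) automaton over n discrete actions."""

    def __init__(self, n_actions, lr=0.1):
        self.p = np.full(n_actions, 1.0 / n_actions)  # action probabilities
        self.lr = lr

    def act(self, rng):
        return rng.choice(len(self.p), p=self.p)

    def update(self, action, beta):
        """beta in [0, 1] is the environment's reward signal.
        Reward-inaction: probabilities move only when rewarded,
        and the update keeps them summing to one."""
        self.p[action] += self.lr * beta * (1.0 - self.p[action])
        others = np.arange(len(self.p)) != action
        self.p[others] -= self.lr * beta * self.p[others]

# Toy usage: the automaton concentrates probability on the rewarded action.
rng = np.random.default_rng(0)
la = LearningAutomaton(n_actions=3)
for _ in range(100):
    a = la.act(rng)
    beta = 1.0 if a == 2 else 0.0  # hypothetical environment rewards action 2
    la.update(a, beta)
```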
Errata Preface: Recent Advances in Hierarchical Reinforcement Learning
... Decision Making, guest edited by Xi-Ren Cao. The publisher offers an apology for printing an incorrect version of the paper in the special issue and renders this paper as the true and correct paper. Abstract: Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent atte...
Inference strategies for solving semi-Markov decision processes
Semi-Markov decision processes are used to formulate many control problems and also play a key role in hierarchical reinforcement learning. In this chapter we show how to translate the decision-making problem into a form that can instead be solved by inference and learning techniques. In particular, we will establish a formal connection between planning in semi-Markov decision processes and infe...
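For reference, the quantity any such reformulation must reproduce is the SMDP Bellman backup, in which the discount depends on the sojourn time tau: Q(s,a) = R(s,a) + sum over tau, s' of gamma^tau P(s',tau|s,a) V(s'). Below is a minimal value-iteration sketch of that backup, not the chapter's inference method; `P_joint` and the other names are assumptions.

```python
import numpy as np

def smdp_value_iteration(P_joint, R, gamma=0.95, iters=500):
    """P_joint[s][a][t][s'] : joint probability that the action takes
    sojourn time t+1 and lands in s'. R[s][a] : expected reward.
    Returns the optimal SMDP value function."""
    n_states, n_actions, max_tau, _ = P_joint.shape
    discounts = gamma ** np.arange(1, max_tau + 1)  # gamma^tau per duration
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q(s,a) = R(s,a) + sum_{tau,s'} gamma^tau P(s',tau|s,a) V(s')
        Q = R + np.einsum('t,satn,n->sa', discounts, P_joint, V)
        V = Q.max(axis=1)
    return V
```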
Tree Based Hierarchical Reinforcement Learning
In this thesis we investigate methods for speeding up automatic control algorithms. Specifically, we provide new abstraction techniques for Reinforcement Learning and Semi-Markov Decision Processes (SMDPs). We introduce the use of policies as temporally abstract actions. This is different from previous definitions of temporally abstract actions as we do not have termination criteria. We provide...
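A generic way to picture a policy used as an action is sketched below: one invocation runs a base policy in the environment, and an SMDP-style Q-learning backup then discounts by the elapsed duration. The fixed horizon here is purely illustrative, since the thesis notably defines these actions without termination criteria; `env.step`, `alpha`, and the other names are assumptions.

```python
def run_policy_as_action(env, state, base_policy, gamma, horizon=10):
    """Execute one base policy for up to `horizon` steps; return the
    discounted reward it accumulated, the resulting state, and the
    elapsed duration. Assumes env.step(state, action) -> (state, r, done)."""
    total, discount = 0.0, 1.0
    for t in range(horizon):
        state, reward, done = env.step(state, base_policy[state])
        total += discount * reward
        discount *= gamma
        if done:
            return total, state, t + 1
    return total, state, horizon

def smdp_q_update(Q, s, o, r_cum, s2, tau, alpha, gamma):
    """SMDP Q-learning backup over policy-valued actions o: the bootstrap
    term is discounted by gamma**tau to reflect the variable duration."""
    target = r_cum + (gamma ** tau) * Q[s2].max()
    Q[s, o] += alpha * (target - Q[s, o])
```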